9. Integration Capabilities
9.1 Data Source Integration
Data sources provide assistants with access to indexed knowledge bases, code repositories, documentation, and other structured information. Assistants can automatically search and retrieve relevant information from connected data sources.
How Data Source Integration Works
When a datasource is attached to an assistant:
- Automatic tool availability: Assistant gains access to knowledge base search tools
- Semantic search: Assistant can search datasource content using natural language
- Context retrieval: Relevant documents/code are retrieved and provided to the LLM
- RAG pattern: Retrieval-Augmented Generation for accurate, grounded responses
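Attaching a datasource is a single configuration key on the assistant; the examples below show this in full, but a minimal sketch looks like the following (the assistant ID and datasource ID here are placeholders):

assistants:
  - id: docs-helper        # hypothetical assistant
    model: gpt-4.1
    system_prompt: Answer questions using the attached documentation.
    datasource_ids:
      - product-docs       # hypothetical indexed datasource
# Once attached, the assistant automatically receives knowledge base
# search tools and retrieves matching content before answering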
Supported Data Sources:
- Code repositories (Git): Source code, documentation, README files
- Confluence pages: Wiki content, documentation, team knowledge
- Jira issues: Tickets, bugs, feature requests, project data
- Google Docs: Documents, spreadsheets, presentations
- File uploads: PDF, Word, Text files, images
- Custom integrations: API-based data sources
Data Source Integration Examples
Example 1: Code Repository Analysis Workflow
enable_summarization_node: false
recursion_limit: 30
assistants:
  - id: code-expert
    model: gpt-4.1
    system_prompt: |
      You are a senior software engineer with deep knowledge of the codebase.
      Use the code repository datasource to find relevant code and documentation.
      Always reference specific files and line numbers.
    datasource_ids:
      - main-codebase-repo  # Git repository indexed as datasource
      - api-documentation   # API docs indexed separately
    # Automatic tools available:
    # - search_knowledge_base: Semantic search across code and docs
    # - get_document_by_id: Retrieve specific files
states:
  - id: analyze-bug-report
    assistant_id: code-expert
    task: |
      Analyze this bug report and identify the root cause:
      {{bug_description}}
      Steps:
      1. Search the codebase for relevant code
      2. Review recent changes in affected areas
      3. Identify the likely root cause
      4. Provide fix recommendations with specific file references
    # Assistant automatically uses search_knowledge_base tool
    # to find relevant code in main-codebase-repo
    next:
      state_id: generate-fix-plan
  - id: generate-fix-plan
    assistant_id: code-expert
    task: |
      Based on the root cause analysis, create a detailed fix plan:
      - Files to modify
      - Code changes needed
      - Testing strategy
      - Potential side effects
      Root cause: {{task}}
    next:
      state_id: end
Example 2: Documentation-Powered Support Workflow
assistants:
  - id: support-agent
    model: gpt-4.1
    system_prompt: |
      You are a customer support specialist with access to all product documentation.
      Always cite specific documentation pages when answering questions.
      If information is not in the docs, clearly state that.
    datasource_ids:
      - product-documentation  # User guides, API docs
      - confluence-kb          # Internal knowledge base
      - faq-database           # Frequently asked questions
    temperature: 0.5
states:
  - id: answer-question
    assistant_id: support-agent
    task: |
      Answer this customer question using documentation:
      {{customer_question}}
      Provide:
      1. Direct answer to the question
      2. References to relevant documentation
      3. Step-by-step instructions if applicable
      4. Links to additional resources
    # Assistant searches across all 3 datasources
    # Retrieves relevant documentation
    # Provides grounded, accurate answers
    next:
      state_id: end
Example 3: Multi-Source Research Workflow
assistants:
  - id: research-analyst
    model: gpt-4.1
    system_prompt: |
      You are a research analyst combining information from multiple sources.
      Synthesize information and identify patterns across different data sources.
    datasource_ids:
      - jira-project-tickets     # Project management data
      - confluence-requirements  # Requirements documentation
      - code-repository          # Implementation code
      - google-drive-specs       # Technical specifications
    limit_tool_output_tokens: 15000  # Large datasource results
states:
  - id: gather-project-context
    assistant_id: research-analyst
    task: |
      Research project {{project_name}} across all available sources:
      1. Find all related Jira tickets (features, bugs, tasks)
      2. Locate requirements documentation in Confluence
      3. Identify implemented code in the repository
      4. Review technical specs from Google Drive
      Provide a comprehensive project overview with references.
    # Assistant automatically searches all 4 datasources
    # Combines information from different sources
    # Creates unified view
    next:
      state_id: analyze-status
  - id: analyze-status
    assistant_id: research-analyst
    task: |
      Based on the gathered information, analyze project status:
      - Completion percentage
      - Open vs. closed tickets
      - Code coverage of requirements
      - Gaps in implementation
      - Risk areas
      Context: {{task}}
    next:
      state_id: generate-report  # report-generation state omitted for brevity
Example 4: Context-Aware Code Generation
assistants:
  - id: code-generator
    model: gpt-4.1
    system_prompt: |
      You are an expert code generator with knowledge of the existing codebase.
      Always follow existing patterns, naming conventions, and architectural styles.
      Reference similar existing code when generating new code.
    datasource_ids:
      - codebase-main
      - codebase-tests
      - coding-standards-doc
    temperature: 0.3
states:
  - id: research-existing-patterns
    assistant_id: code-generator
    task: |
      Research existing code patterns for: {{feature_type}}
      Find:
      1. Similar existing implementations
      2. Commonly used libraries/frameworks
      3. Architectural patterns in use
      4. Testing patterns from the test codebase
      5. Relevant coding standards
    # Searches across codebase and standards datasources
    next:
      state_id: generate-code
  - id: generate-code
    assistant_id: code-generator
    task: |
      Generate production-ready code for: {{feature_description}}
      Follow these existing patterns: {{task}}
      Requirements:
      - Match existing code style
      - Use same libraries as similar features
      - Include comprehensive tests
      - Add appropriate error handling
    next:
      state_id: end
Example 5: Selective Datasource Access
assistants:
  # Different assistants with different datasource access
  - id: classifier
    model: gpt-4.1-mini
    system_prompt: Classify incoming queries as public or internal
  - id: public-doc-assistant
    model: gpt-4.1-mini
    system_prompt: Handle public documentation queries
    datasource_ids:
      - public-docs
      - public-api-docs
    # Only sees public information
  - id: internal-assistant
    model: gpt-4.1
    system_prompt: Handle internal team queries
    datasource_ids:
      - internal-confluence
      - internal-jira
      - internal-code-repos
    # Sees sensitive internal information
states:
  - id: route-query
    assistant_id: classifier
    task: |
      Classify query: {{user_query}}
      Is this public or internal information?
    output_schema: |
      {
        "type": "object",
        "properties": {
          "query_type": {"type": "string", "enum": ["public", "internal"]}
        }
      }
    next:
      condition:
        expression: "query_type == 'public'"
        then: handle-public
        otherwise: handle-internal
  - id: handle-public
    assistant_id: public-doc-assistant
    task: 'Answer query: {{user_query}}'
    # Uses only public datasources
    next:
      state_id: end
  - id: handle-internal
    assistant_id: internal-assistant
    task: 'Answer query: {{user_query}}'
    # Uses internal datasources
    next:
      state_id: end
Data Source Best Practices
1. Limit datasources to relevant content:
   # Good: Specific, relevant datasources
   datasource_ids:
     - backend-codebase  # For backend questions
     - backend-api-docs  # Supporting docs

   # Avoid: Too many unrelated datasources
   datasource_ids:
     - backend-codebase
     - frontend-codebase
     - mobile-codebase
     - design-docs
     - marketing-materials  # Not relevant for technical questions
2. Use descriptive datasource names:
   - Clear naming helps with debugging and maintenance
   - Include scope in the name, e.g. engineering-docs-2024, customer-facing-api-docs
3. Monitor search quality:
   - Review retrieved documents in workflow logs
   - Adjust datasource indexing if results are poor
   - Consider datasource size limits (very large repos may need filtering)
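Where filtering alone is not enough, the limit_tool_output_tokens setting shown in Example 3 can cap how much retrieved content a single tool call adds to the context; a minimal sketch, with an illustrative value:

   assistants:
     - id: repo-assistant               # hypothetical assistant
       model: gpt-4.1
       datasource_ids:
         - large-monorepo               # hypothetical very large datasource
       limit_tool_output_tokens: 15000  # cap on tool output added to context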
4. Combine with explicit instructions:
   system_prompt: |
     Search the {{datasource}} when you need information about:
     - API endpoints and parameters
     - Code implementation details
     - Historical decisions and context
     Do NOT search when:
     - Answering general knowledge questions
     - Providing coding best practices (use your training)
5. Handle missing information gracefully:
   system_prompt: |
     If you cannot find relevant information in the datasources, clearly state:
     "I searched the available documentation but couldn't find specific information about X."
     Do NOT make up information or hallucinate answers.
9.2 Tool Integration
CodeMie Workflows provides an extensive set of built-in tools for cloud platforms, code manipulation, knowledge base integration, and custom plugin development. Tools extend assistant capabilities with specific actions and integrations.
Built-in Tool Categories
Cloud Platform Tools:
- AWS: EC2, ECS, Lambda, S3, CloudFormation, RDS, DynamoDB
- Azure: VMs, App Service, Functions, Storage, SQL Database
- GCP: Compute Engine, Cloud Functions, Cloud Storage, BigQuery
- Kubernetes: Deployments, Services, Pods, ConfigMaps
Code Tools:
- Analysis: AST parsing, complexity analysis, security scanning
- Manipulation: Code generation, refactoring, formatting
- Testing: Test generation, coverage analysis
- Documentation: Auto-documentation, API spec generation
Knowledge Base Tools:
- Search: Semantic search, keyword search, filtered search
- Retrieval: Document retrieval, code snippet extraction
- Indexing: Add/update documents in knowledge bases
Integration Tools:
- IDE Integration: File operations, navigation, code actions
- NATS Plugin System: Custom tool development via message broker
- HTTP Tools: REST API calls, webhooks, external service integration
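The worked examples below cover the AWS, code, knowledge base, and plugin categories; Kubernetes and HTTP tools attach the same way. A hedged sketch, where the Kubernetes tool names and integration aliases are assumptions rather than a canonical list:

assistants:
  - id: cluster-operator                   # hypothetical assistant
    model: gpt-4.1
    tools:
      - name: kubernetes_list_deployments  # assumed tool name
        integration_alias: k8s-prod        # hypothetical cluster credentials
      - name: http_request                 # HTTP tool (see best practice 4 below)
        integration_alias: status-api      # hypothetical external service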
Tool Integration Examples
Example 1: AWS Infrastructure Management Workflow
enable_summarization_node: false
recursion_limit: 25
assistants:
  - id: aws-ops-engineer
    model: gpt-4.1
    system_prompt: |
      You are an AWS operations engineer. Use AWS tools to manage infrastructure.
      Always verify current state before making changes.
      Provide detailed status updates.
    tools:
      - name: aws_ec2_describe_instances
        integration_alias: aws-prod
      - name: aws_ec2_start_instances
        integration_alias: aws-prod
      - name: aws_ec2_stop_instances
        integration_alias: aws-prod
      - name: aws_ecs_list_services
        integration_alias: aws-prod
      - name: aws_ecs_update_service
        integration_alias: aws-prod
states:
  - id: check-current-status
    assistant_id: aws-ops-engineer
    task: |
      Check the current status of EC2 instances in region {{aws_region}}.
      Filter by tag: Environment={{environment}}
      Provide:
      - Total instance count
      - Running vs stopped instances
      - Instance types and sizes
      - Any issues or anomalies
    # Assistant uses aws_ec2_describe_instances tool
    next:
      state_id: scale-decision
  - id: scale-decision
    assistant_id: aws-ops-engineer
    task: |
      Based on current status: {{task}}
      Determine if scaling is needed for target capacity: {{target_capacity}} instances.
      Return JSON:
      {
        "action": "scale_up|scale_down|no_change",
        "instances_to_start": [],
        "instances_to_stop": []
      }
    output_schema: |
      {
        "type": "object",
        "properties": {
          "action": {"type": "string"},
          "instances_to_start": {"type": "array"},
          "instances_to_stop": {"type": "array"}
        }
      }
    next:
      condition:
        expression: "action != 'no_change'"
        then: execute-scaling
        otherwise: end
  - id: execute-scaling
    assistant_id: aws-ops-engineer
    task: |
      Execute the scaling action: {{action}}
      Start instances: {{instances_to_start}}
      Stop instances: {{instances_to_stop}}
      Verify the changes and report final status.
    # Assistant uses aws_ec2_start_instances and aws_ec2_stop_instances
    next:
      state_id: end
Example 2: Code Analysis and Refactoring Workflow
assistants:
  - id: code-quality-expert
    model: gpt-4.1
    temperature: 0.3
    system_prompt: |
      You are a code quality expert. Use code analysis tools to identify issues.
      Provide specific, actionable refactoring recommendations.
    tools:
      - name: code_analyze_complexity
      - name: code_analyze_security
      - name: code_generate_tests
      - name: code_refactor_extract_method
      - name: code_format
    datasource_ids:
      - main-codebase
    mcp_servers:
      - name: mcp-server-filesystem
        description: File operations
        config:
          command: mcp-server-filesystem
          args:
            - '/workspace'
states:
  - id: analyze-file
    assistant_id: code-quality-expert
    task: |
      Analyze the code file: {{file_path}}
      Use tools to check:
      1. Cyclomatic complexity
      2. Security vulnerabilities
      3. Code smells
      4. Test coverage
      Provide a comprehensive report with severity ratings.
    # Uses: code_analyze_complexity, code_analyze_security
    next:
      state_id: generate-recommendations
  - id: generate-recommendations
    assistant_id: code-quality-expert
    task: |
      Based on analysis: {{task}}
      Generate specific refactoring recommendations:
      - High priority issues to fix
      - Suggested refactorings with code examples
      - Test cases to add
      - Security fixes required
      Rank by impact and effort.
    next:
      state_id: apply-auto-fixes
  - id: apply-auto-fixes
    assistant_id: code-quality-expert
    task: |
      Apply automatic fixes for:
      1. Code formatting
      2. Simple refactorings (extract method, rename variables)
      3. Generate missing tests
      Report what was changed.
    # Uses: code_format, code_refactor_extract_method, code_generate_tests
    next:
      state_id: end
Example 3: Multi-Cloud Resource Discovery
assistants:
  - id: cloud-auditor
    model: gpt-4.1
    system_prompt: |
      You are a cloud infrastructure auditor working across multiple cloud providers.
      Gather resource information and create comprehensive inventory reports.
    tools:
      # AWS tools
      - name: aws_ec2_describe_instances
        integration_alias: aws-prod
      - name: aws_s3_list_buckets
        integration_alias: aws-prod
      # Azure tools
      - name: azure_vm_list
        integration_alias: azure-prod
      - name: azure_storage_list_accounts
        integration_alias: azure-prod
      # GCP tools
      - name: gcp_compute_list_instances
        integration_alias: gcp-prod
      - name: gcp_storage_list_buckets
        integration_alias: gcp-prod
states:
  - id: discover-aws-resources
    assistant_id: cloud-auditor
    task: |
      Discover all AWS resources:
      - EC2 instances
      - S3 buckets
      Collect metadata: names, IDs, regions, tags, sizes.
    next:
      state_id: discover-azure-resources
  - id: discover-azure-resources
    assistant_id: cloud-auditor
    task: |
      Discover all Azure resources:
      - Virtual machines
      - Storage accounts
      Collect metadata: names, IDs, regions, resource groups.
    next:
      state_id: discover-gcp-resources
  - id: discover-gcp-resources
    assistant_id: cloud-auditor
    task: |
      Discover all GCP resources:
      - Compute instances
      - Storage buckets
      Collect metadata: names, IDs, zones, labels.
    next:
      state_id: generate-inventory-report
  - id: generate-inventory-report
    assistant_id: cloud-auditor
    task: |
      Create unified inventory report across all clouds:
      AWS: {{task from discover-aws-resources}}
      Azure: {{task from discover-azure-resources}}
      GCP: {{task from discover-gcp-resources}}
      Provide:
      - Total resource counts by type and provider
      - Cost estimates (if available)
      - Potential optimization opportunities
      - Security concerns
    next:
      state_id: end
Example 4: Knowledge Base Integration Workflow
assistants:
  - id: documentation-manager
    model: gpt-4.1
    system_prompt: |
      You manage documentation knowledge bases.
      Keep documentation up-to-date and well-organized.
    tools:
      - name: kb_search           # Search knowledge base
      - name: kb_add_document     # Add new documents
      - name: kb_update_document  # Update existing docs
      - name: kb_delete_document  # Remove outdated docs
    mcp_servers:
      - name: mcp-server-filesystem
        config:
          command: mcp-server-filesystem
          args:
            - '/docs'
states:
  - id: find-outdated-docs
    assistant_id: documentation-manager
    task: |
      Search the knowledge base for documentation related to: {{topic}}
      Identify:
      - Outdated information (older than 6 months)
      - Conflicting information
      - Missing documentation
    # Uses kb_search tool
    next:
      state_id: generate-updates
  - id: generate-updates
    assistant_id: documentation-manager
    task: |
      Based on findings: {{task}}
      Generate updated documentation for {{topic}}.
      Include:
      - Current best practices
      - Updated code examples
      - Links to related documentation
    next:
      state_id: update-knowledge-base
  - id: update-knowledge-base
    assistant_id: documentation-manager
    task: |
      Update the knowledge base:
      1. Remove outdated documents (if any)
      2. Add new documentation: {{task}}
      3. Update indexes and cross-references
      Report what was changed.
    # Uses kb_update_document, kb_add_document
    next:
      state_id: end
Example 5: Custom Plugin Integration via NATS
assistants:
  - id: custom-integration-agent
    model: gpt-4.1
    system_prompt: |
      You integrate with custom internal tools via the NATS plugin system.
    tools:
      - name: custom_salesforce_query  # Custom plugin tool
        integration_alias: salesforce-api
      - name: custom_slack_notify      # Custom plugin tool
        integration_alias: slack-webhook
      - name: custom_datadog_metrics   # Custom plugin tool
        integration_alias: datadog-api
states:
  - id: query-salesforce
    assistant_id: custom-integration-agent
    task: |
      Query Salesforce for customer: {{customer_id}}
      Retrieve:
      - Account information
      - Recent opportunities
      - Support tickets
    # Uses custom_salesforce_query tool via NATS plugin
    next:
      state_id: analyze-customer-health
  - id: analyze-customer-health
    assistant_id: custom-integration-agent
    task: |
      Analyze customer health based on Salesforce data: {{task}}
      Calculate health score (0-100) based on:
      - Opportunity pipeline
      - Support ticket volume
      - Account age and size
    next:
      state_id: send-alert
  - id: send-alert
    assistant_id: custom-integration-agent
    task: |
      If health score < 60, send alert:
      1. Notify account team via Slack (custom_slack_notify)
      2. Log metric to Datadog (custom_datadog_metrics)
      3. Create follow-up task
      Health score: {{health_score}}
      Customer: {{customer_id}}
    # Uses custom plugin tools
    next:
      state_id: end
Tool Integration Best Practices
1. Use integration aliases for credentials:
   tools:
     - name: aws_s3_upload
       integration_alias: aws-prod  # Credentials managed centrally
   # Avoids hardcoding AWS keys in the workflow
2. Limit tool access per assistant:
   # Security: Only give necessary tools
   assistants:
     - id: read-only-auditor
       tools:
         - name: aws_ec2_describe_instances  # ✓ Read-only
       # NO write operations like aws_ec2_terminate_instances
3. Validate tool outputs:
   states:
     - id: deploy-with-validation
       assistant_id: deployer
       task: 'Deploy service and verify health checks pass'
       retry_policy:
         max_attempts: 3  # Retry if deployment fails
       next:
         condition:
           expression: "deployment_status == 'healthy'"
           then: success
           otherwise: rollback
4. Use tool result JSON pointers:
   tools:
     - id: large-api-response
       tool: http_request
       tool_result_json_pointer: /data/items  # Extract only needed data
   # Reduces context size and token usage
5. Combine built-in and custom tools:
   tools:
     - name: aws_s3_upload      # Built-in AWS tool
     - name: custom_virus_scan  # Custom plugin via NATS
     - name: slack_notify       # Custom notification plugin
   # Workflow: Upload file → Scan for viruses → Notify team
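As a sketch, the upload-scan-notify flow in that comment could be expressed as states like these (the state names, assistant ID, and task wording are illustrative):

   states:
     - id: upload-file
       assistant_id: file-handler  # hypothetical assistant holding the tools above
       task: 'Upload {{file_path}} to the release bucket using aws_s3_upload'
       next:
         state_id: scan-file
     - id: scan-file
       assistant_id: file-handler
       task: 'Scan the uploaded file with custom_virus_scan and report the verdict'
       next:
         state_id: notify-team
     - id: notify-team
       assistant_id: file-handler
       task: 'Notify the team via slack_notify with the upload and scan results'
       next:
         state_id: end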
9.3 MCP (Model Context Protocol) Integration
MCP servers can be integrated with assistants to provide additional tools and capabilities. See Section 3.6 for complete MCP server configuration reference.
assistants:
  - id: assistant-1
    mcp_servers:
      - name: filesystem
        enabled: true
        config:
          command: npx
          args: ['-y', '@modelcontextprotocol/server-filesystem', '{{project_root_folder}}']
      - name: database
        enabled: true
        config:
          command: uvx
          args: ['mcp-server-postgres']
          env:
            DATABASE_URL: '{{db_connection_string}}'
        integration_alias: postgres-prod
        resolve_dynamic_values_in_arguments: true
For detailed MCP server configuration including HTTP servers, transport types, and all available options, refer to Section 3.6: MCP Server Configuration.