refactor: inline skill-creator reference files into SKILL.md

This commit is contained in:
zhayujie
2026-03-09 12:02:52 +08:00
parent 3b8b5625f8
commit 3c6781d240
6 changed files with 14 additions and 667 deletions

View File

@@ -1,168 +0,0 @@
# OpenAI Image Vision - Usage Examples
## Setup
Set up your API credentials using the agent's env_config tool:
```bash
# Set your OpenAI API key
env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")
# Optional: Set custom API base URL (for proxy or compatible services)
env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
```
## Example 1: Analyze a Local Image
```bash
bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
```
**Expected Output:**
```json
{
"model": "gpt-4.1-mini",
"content": "The image shows a beautiful landscape with mountains in the background and a lake in the foreground. The sky is clear with some clouds, and there are trees along the shoreline.",
"usage": {
"prompt_tokens": 1234,
"completion_tokens": 45,
"total_tokens": 1279
}
}
```
## Example 2: Analyze an Image from URL
```bash
bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image in detail"
```
## Example 3: Extract Text (OCR)
```bash
bash scripts/vision.sh "document.png" "Extract all text from this image"
```
**Use Case:** Extract text from screenshots, scanned documents, or photos of text.
## Example 4: Identify Objects
```bash
bash scripts/vision.sh "scene.jpg" "List all objects you can identify in this image"
```
## Example 5: Analyze Colors and Composition
```bash
bash scripts/vision.sh "artwork.jpg" "Describe the color palette and composition of this image"
```
## Example 6: Count Items
```bash
bash scripts/vision.sh "crowd.jpg" "How many people are in this image?"
```
## Example 7: Use Different Models
```bash
# Use gpt-4.1-mini (default, latest mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"
# Use gpt-4.1 (most capable, best for complex analysis)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"
# Use gpt-4o-mini (previous mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
```
## Example 8: Complex Analysis
```bash
bash scripts/vision.sh "product.jpg" "Analyze this product image. Describe the product, its features, colors, and suggest what kind of marketing copy would work well for it."
```
## Example 9: Safety and Content Moderation
```bash
bash scripts/vision.sh "content.jpg" "Is there any inappropriate or unsafe content in this image?"
```
## Example 10: Technical Analysis
```bash
bash scripts/vision.sh "diagram.png" "Explain what this technical diagram represents and how it works"
```
## Integration with Agent
When the agent loads this skill, it will be available in the `<available_skills>` section. The agent can use it like:
```bash
bash "<base_dir>/scripts/vision.sh" "user_uploaded_image.jpg" "What's in this image?"
```
The `<base_dir>` will be automatically provided by the skill system.
## Error Handling Examples
### Missing API Key
```bash
$ bash scripts/vision.sh "image.jpg" "What is this?"
{"error": "OPENAI_API_KEY environment variable is not set", "help": "Visit https://platform.openai.com/api-keys to get an API key"}
```
### File Not Found
```bash
$ bash scripts/vision.sh "nonexistent.jpg" "What is this?"
{"error": "Image file not found", "path": "nonexistent.jpg"}
```
### Unsupported Format
```bash
$ bash scripts/vision.sh "file.bmp" "What is this?"
{"error": "Unsupported image format", "extension": "bmp", "supported": ["jpg", "jpeg", "png", "gif", "webp"]}
```
### Missing Parameters
```bash
$ bash scripts/vision.sh
{"error": "Image path or URL is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}
```
## Tips for Best Results
1. **Be Specific**: Ask clear, specific questions about what you want to know
2. **Image Quality**: Higher quality images generally produce better results
3. **Model Selection**:
- Use `gpt-4.1` for complex analysis requiring highest accuracy
- Use `gpt-4.1-mini` (default) for most tasks - latest mini model with good balance
4. **Text Extraction**: For OCR tasks, ensure text is clearly visible and not too small
5. **Multiple Aspects**: You can ask about multiple things in one question
6. **Context**: Provide context in your question if needed (e.g., "This is a medical scan, what do you see?")
## Performance Notes
- **Local Files**: Automatically base64-encoded, adds ~33% size overhead
- **URLs**: Passed directly to API, no encoding overhead
- **Timeout**: 60 seconds for API calls
- **Max Tokens**: 1000 tokens for responses (configurable in script)
- **Rate Limits**: Subject to your OpenAI API plan
## Supported Image Formats
✅ JPEG (`.jpg`, `.jpeg`)
✅ PNG (`.png`)
✅ GIF (`.gif`)
✅ WebP (`.webp`)
❌ BMP, TIFF, SVG, and other formats are not supported
## Cost Considerations
Vision API calls cost more than text-only calls because they include image tokens. Costs vary by:
- Model used (gpt-4.1 vs gpt-4.1-mini)
- Image size and resolution
- Length of response
Check OpenAI's pricing page for current rates: https://openai.com/pricing

View File

@@ -1,182 +0,0 @@
# OpenAI Image Vision Skill
This skill enables image analysis using OpenAI's Vision API (GPT-4 Vision models).
## Features
- ✅ Analyze images from local files or URLs
- ✅ Support for multiple image formats (JPEG, PNG, GIF, WebP)
- ✅ Automatic base64 encoding for local files
- ✅ Direct URL passing for remote images
- ✅ Configurable model selection
- ✅ Custom API base URL support
- ✅ Pure bash/curl implementation (no Python dependencies)
## Quick Start
1. **Set up API credentials using env_config:**
```bash
env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")
# Optional: custom API base
env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
```
2. **Analyze an image:**
```bash
bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
```
3. **Analyze from URL:**
```bash
bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image"
```
```bash
bash scripts/vision.sh "/path/to/image.jpg" "What's in this image?"
```
3. **Analyze from URL:**
```bash
bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image"
```
## Usage Examples
### Basic image analysis
```bash
bash scripts/vision.sh "photo.jpg" "What objects can you see?"
```
### Text extraction (OCR)
```bash
bash scripts/vision.sh "document.png" "Extract all text from this image"
```
### Detailed description
```bash
bash scripts/vision.sh "scene.jpg" "Describe this scene in detail, including colors, mood, and composition"
```
### Using different models
```bash
# Use gpt-4.1-mini (default, latest mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"
# Use gpt-4.1 (most capable, latest model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"
# Use gpt-4o-mini (previous mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
```
## Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `OPENAI_API_KEY` | No* | - | OpenAI API key (preferred) |
| `OPENAI_API_BASE` | No | `https://api.openai.com/v1` | Custom OpenAI API base URL |
| `LINKAI_API_KEY` | No* | - | LinkAI API key (fallback when OPENAI_API_KEY is not set) |
| `LINKAI_API_BASE` | No | `https://api.link-ai.tech` | LinkAI API base URL |
\* At least one of `OPENAI_API_KEY` or `LINKAI_API_KEY` must be set. OpenAI takes priority when both are configured.
## Response Format
Success response:
```json
{
"model": "gpt-4.1-mini",
"content": "The image shows a beautiful sunset over mountains...",
"usage": {
"prompt_tokens": 1234,
"completion_tokens": 567,
"total_tokens": 1801
}
}
```
Error response:
```json
{
"error": "Error description",
"details": "Additional information"
}
```
## Supported Models
- `gpt-4.1-mini` (default) - Latest mini model, fast and cost-effective
- `gpt-4.1` - Latest GPT-4 variant, most capable
- `gpt-4o-mini` - Previous generation mini model
- `gpt-4-turbo` - Previous generation turbo model
## Supported Image Formats
- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- GIF (`.gif`)
- WebP (`.webp`)
## Technical Details
- **Implementation**: Pure bash script using curl and base64
- **Timeout**: 60 seconds for API calls
- **Max tokens**: 1000 tokens for responses
- **Image handling**:
- Local files are automatically base64-encoded
- URLs are passed directly to the API
- MIME types are auto-detected from file extensions
## Error Handling
The script handles various error cases:
- Missing required parameters
- Missing API key
- File not found
- Unsupported image formats
- API errors
- Network timeouts
- Invalid JSON responses
## Integration with Agent System
When loaded by the agent system, this skill will appear in `<available_skills>` with a `<base_dir>` path. Use it like:
```bash
bash "<base_dir>/scripts/vision.sh" "image.jpg" "What's in this image?"
```
The agent will automatically:
- Load environment variables from `~/.cow/.env`
- Provide the correct `<base_dir>` path
- Handle skill discovery and registration
## Notes
- Images are sent to OpenAI's servers for processing
- Large images may be automatically resized by the API
- Rate limits depend on your OpenAI API plan
- Token usage includes both the image and text in the prompt
- Base64 encoding increases the size of local images by ~33%
## Troubleshooting
**"OPENAI_API_KEY environment variable is not set"**
- Set the environment variable using env_config tool
- Or use the agent's env_config tool
**"Image file not found"**
- Check the file path is correct
- Use absolute paths or paths relative to current directory
**"Unsupported image format"**
- Only JPEG, PNG, GIF, and WebP are supported
- Check the file extension matches the actual format
**"Failed to call OpenAI API"**
- Check your internet connection
- Verify the API key is valid
- Check if custom API base URL is correct
## License
Part of the chatgpt-on-wechat project.

View File

@@ -1,202 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@@ -210,14 +210,23 @@ After initialization, customize the SKILL.md and add resources as needed. If you
When editing the (newly-generated or existing) skill, remember that the skill is being created for another instance of the agent to use. Include information that would be beneficial and non-obvious to the agent. Consider what procedural knowledge, domain-specific details, or reusable assets would help another agent instance execute these tasks more effectively.
#### Learn Proven Design Patterns
#### Design Patterns
Consult these helpful guides based on your skill's needs:
**Workflow patterns** — For complex tasks, break operations into sequential steps or conditional branches:
- **Multi-step processes**: See references/workflows.md for sequential workflows and conditional logic
- **Specific output formats or quality standards**: See references/output-patterns.md for template and example patterns
```markdown
# Sequential: list numbered steps with scripts
1. Analyze the form (run analyze_form.py)
2. Create field mapping (edit fields.json)
3. Fill the form (run fill_form.py)
These files contain established best practices for effective skill design.
# Conditional: guide through decision points
1. Determine the modification type:
**Creating new content?** → Follow "Creation workflow"
**Editing existing content?** → Follow "Editing workflow"
```
**Output patterns** — When consistent output format matters, provide a template or input/output examples in SKILL.md so the agent can follow the desired style.
#### Start with Reusable Skill Contents

View File

@@ -1,82 +0,0 @@
# Output Patterns
Use these patterns when skills need to produce consistent, high-quality output.
## Template Pattern
Provide templates for output format. Match the level of strictness to your needs.
**For strict requirements (like API responses or data formats):**
```markdown
## Report structure
ALWAYS use this exact template structure:
# [Analysis Title]
## Executive summary
[One-paragraph overview of key findings]
## Key findings
- Finding 1 with supporting data
- Finding 2 with supporting data
- Finding 3 with supporting data
## Recommendations
1. Specific actionable recommendation
2. Specific actionable recommendation
```
**For flexible guidance (when adaptation is useful):**
```markdown
## Report structure
Here is a sensible default format, but use your best judgment:
# [Analysis Title]
## Executive summary
[Overview]
## Key findings
[Adapt sections based on what you discover]
## Recommendations
[Tailor to the specific context]
Adjust sections as needed for the specific analysis type.
```
## Examples Pattern
For skills where output quality depends on seeing examples, provide input/output pairs:
```markdown
## Commit message format
Generate commit messages following these examples:
**Example 1:**
Input: Added user authentication with JWT tokens
Output:
```
feat(auth): implement JWT-based authentication
Add login endpoint and token validation middleware
```
**Example 2:**
Input: Fixed bug where dates displayed incorrectly in reports
Output:
```
fix(reports): correct date formatting in timezone conversion
Use UTC timestamps consistently across report generation
```
Follow this style: type(scope): brief description, then detailed explanation.
```
Examples help Claude understand the desired style and level of detail more clearly than descriptions alone.

View File

@@ -1,28 +0,0 @@
# Workflow Patterns
## Sequential Workflows
For complex tasks, break operations into clear, sequential steps. It is often helpful to give Claude an overview of the process towards the beginning of SKILL.md:
```markdown
Filling a PDF form involves these steps:
1. Analyze the form (run analyze_form.py)
2. Create field mapping (edit fields.json)
3. Validate mapping (run validate_fields.py)
4. Fill the form (run fill_form.py)
5. Verify output (run verify_output.py)
```
## Conditional Workflows
For tasks with branching logic, guide Claude through decision points:
```markdown
1. Determine the modification type:
**Creating new content?** → Follow "Creation workflow" below
**Editing existing content?** → Follow "Editing workflow" below
2. Creation workflow: [steps]
3. Editing workflow: [steps]
```