
Appendix: Assessing trends in the Danish Labour Market with DeiC Interactive HPC

Code example 1

from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load dataset and tokenizer (the CSV is assumed to have a 'text' column and a 'label' column)
dataset = load_dataset("csv", data_files={"train": "job_postings.csv"})
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Preprocessing function
def preprocess(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True)

# Apply preprocessing in batches
tokenized_dataset = dataset.map(preprocess, batched=True)

# Training configuration
batch_size = 32  # Per-device batch size; adjust based on GPU memory
train_args = TrainingArguments(
    output_dir="./results",
    # Step-wise evaluation is left at its default (disabled) because no eval_dataset is passed to the Trainer
    save_strategy="epoch",
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=2,  # To simulate larger batch size
    num_train_epochs=3,
    logging_dir="./logs",
    logging_steps=10,
    dataloader_num_workers=4,  # Optimize data loading
    fp16=True  # Enable mixed precision for speed
)

# Load BERT model (the classification head defaults to 2 labels; pass num_labels to match the label set)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Trainer
trainer = Trainer(
    model=model,
    args=train_args,
    train_dataset=tokenized_dataset['train']
)

# Start training
trainer.train()
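
The `gradient_accumulation_steps` setting in code example 1 is described as simulating a larger batch size. The arithmetic behind that comment can be sketched as follows. The helper name `effective_batch_size` and the `num_devices` parameter are illustrative assumptions, not part of the original listing or of the `transformers` API.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update.

    Gradients are accumulated over several forward/backward passes before
    the optimizer steps, so the update sees the product of all three factors.
    """
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values taken from code example 1: batch size 32, 2 accumulation steps, 1 GPU assumed
print(effective_batch_size(32, 2))
```

With the values above, each optimizer update is based on 64 examples on a single GPU, while only 32 examples need to fit in GPU memory at once.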

Code example 2

# Prompt Strategy 2: Chain of Thought
custom_prompt2 = """As an expert in job market analysis, carefully examine the following job advertisement. Your task is to identify and extract key skills and qualifications mentioned in the ad. 
Once identified, categorize these skills into the appropriate categories from the list provided: 'CRM','Computer Support and Networking','Data Analysis',
'Digital Design','Digital Marketing','Machining and Manufacturing Technology','Productivity','Programming and Software Development', 'Character','Cognitive','Customerservice','Financial', 'Management','Social','Writing_language'. For each identified skill or qualification, provide a brief explanation of why it fits into its respective category. Your analysis should be detailed and precise to ensure accuracy in categorization. Format your response as a structured JSON object with each category as a key. Under each key, list the identified skills along with a brief explanation for each. If a skill does not fit any of the listed categories, classify it under 'Other'. Ensure your response is well-organized and easy to understand. Here's an example structure for your JSON output. Now the job posting begins:"""
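
The prompt above ends where the job posting is expected to begin, so the posting text has to be appended before the prompt is sent to a model. A minimal sketch of that step follows. The function name `build_prompt` and the blank-line separator are assumptions for illustration; the source does not show how the prompt and posting were combined or which model API was used.

```python
def build_prompt(instructions: str, job_ad: str) -> str:
    """Append a job advertisement to an instruction prompt.

    Surrounding whitespace is stripped from both parts and a blank line
    separates them, so the posting clearly starts after the instructions.
    """
    return f"{instructions.strip()}\n\n{job_ad.strip()}"


# Hypothetical stand-in for custom_prompt2 and a made-up job ad
example_instructions = "Extract the skills from the ad. Now the job posting begins:"
example_ad = "Seeking a welder with CNC experience."
print(build_prompt(example_instructions, example_ad))
```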

Few-shot Learning
# Prompt Strategy 4: Few Shot Learning
custom_prompt4 = """
Example 1:
Job Advertisement: "Seeking a data analyst with experience in SQL, Python, and data visualization. Must possess strong analytical skills and be familiar with machine learning techniques."
Extracted Skills:
  "Data Analysis": ["SQL", "Python", "Data visualization", "Machine learning techniques"],
  "Cognitive": ["Analytical skills"]
Example 2:
Job Advertisement: "Digital marketing specialist required with expertise in social media advertising, content creation, and SEO. Must have excellent writing skills and be a creative thinker."
Extracted Skills:
  "Digital Marketing": ["social media advertising", "SEO"],
  "Writing_language": ["content creation", "Excellent writing skills"],
  "Cognitive": ["creative thinker"]
Your task is to identify and extract key skills and qualifications mentioned in the ad. Once identified, categorize these skills into the appropriate categories from the list provided: 'CRM','Computer Support and Networking','Data Analysis','Digital Design','Digital Marketing','Machining and Manufacturing Technology','Productivity','Programming and Software Development', 'Character','Cognitive','Customerservice','Financial', 'Management','Social','Writing_language'
For each identified skill or qualification, provide a brief explanation of why it fits into its respective category. Your analysis should be detailed and precise to ensure accuracy in categorization.
Format your response as a structured JSON object with each category as a key. Under each key, list the identified skills along with a brief explanation for each. If a skill does not fit any of the listed categories, classify it under 'Other'. Ensure your response is well-organized and easy to understand.
Now, analyze the following job advertisement:
""".strip()
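
Both prompts ask the model for a JSON object keyed by category, with anything outside the list placed under 'Other'. A reply can be checked against that contract before it is used downstream. The sketch below is one possible way to do so, not the authors' pipeline: the function name `parse_skill_response` is hypothetical, and it assumes the reply holds a single JSON object, possibly surrounded by extra prose.

```python
import json

# Category labels copied from the prompts above
CATEGORIES = [
    "CRM", "Computer Support and Networking", "Data Analysis", "Digital Design",
    "Digital Marketing", "Machining and Manufacturing Technology", "Productivity",
    "Programming and Software Development", "Character", "Cognitive",
    "Customerservice", "Financial", "Management", "Social", "Writing_language",
]


def parse_skill_response(response_text: str) -> dict:
    """Parse a model reply and fold unknown categories into 'Other'."""
    # Take the span from the first '{' to the last '}' to skip surrounding prose
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    parsed = json.loads(response_text[start:end + 1])

    result = {}
    for category, skills in parsed.items():
        key = category if category in CATEGORIES else "Other"
        # Accept a single value as well as a list of values
        items = skills if isinstance(skills, list) else [skills]
        result.setdefault(key, []).extend(items)
    return result


# Made-up reply for illustration only
reply = 'Here is the result:\n{"Data Analysis": ["SQL"], "Cooking": ["Baking"]}'
print(parse_skill_response(reply))
```

Replies that contain no JSON object raise `ValueError`, and malformed JSON raises `json.JSONDecodeError`, so failed extractions can be logged and retried rather than silently dropped.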
Revised
07 Jan 2025